Augmented context features for Arabic speech recognition
نویسندگان
چکیده
We investigate different types of features for language modeling in Arabic automatic speech recognition. While much effort in language modeling research has been directed at designing better models or smoothing techniques for n-gram language models, in this paper we take the approach of augmenting the context in the n-gram model with different sources of information. We start by adding word class labels to the context. The word classes are automatically derived from un-annotated training data. As a contrast, we also experiment with POS tags which require a tagger trained on annotated data. An amalgam of these two methods uses class labels defined on word and POS tag combinations. Other context features include super-tags derived from the syntactic tree structure as well as semantic features derived from PropBank. Experiments on the DARPA GALE Arabic speech recognition task show that augmented context features often improve both perplexity and word error rate.
منابع مشابه
An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition
Convolutional Neural Networks (CNNs) have been shown their performance in speech recognition systems for extracting features, and also acoustic modeling. In addition, CNNs have been used for robust speech recognition and competitive results have been reported. Convolutive Bottleneck Network (CBN) is a kind of CNNs which has a bottleneck layer among its fully connected layers. The bottleneck fea...
متن کاملOff-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model
In order to facilitate the entry of data into the computer and its digitalization, automatic recognition of printed texts and manuscripts is one of the considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...
متن کاملImproved Bayesian Training for Context-Dependent Modeling in Continuous Persian Speech Recognition
Context-dependent modeling is a widely used technique for better phone modeling in continuous speech recognition. While different types of context-dependent models have been used, triphones have been known as the most effective ones. In this paper, a Maximum a Posteriori (MAP) estimation approach has been used to estimate the parameters of the untied triphone model set used in data-driven clust...
متن کاملClassification of emotional speech using spectral pattern features
Speech Emotion Recognition (SER) is a new and challenging research area with a wide range of applications in man-machine interactions. The aim of a SER system is to recognize human emotion by analyzing the acoustics of speech sound. In this study, we propose Spectral Pattern features (SPs) and Harmonic Energy features (HEs) for emotion recognition. These features extracted from the spectrogram ...
متن کاملAllophone-based acoustic modeling for Persian phoneme recognition
Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighbor phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...
متن کامل